Abstract

We establish a duality between $L^p$-Wasserstein control and $L^q$-gradient estimate
in a general framework. Our result extends a known result for a heat flow on a
Riemannian manifold. In particular, we can derive a Wasserstein control of a heat flow
directly from the corresponding gradient estimate of the heat semigroup without
using any other notion of lower curvature bound. By applying our result to a
subelliptic heat flow on a Lie group, we obtain a coupling of heat distributions
which carries a good control of their relative distance.
1 Introduction

There are several ways to formulate a quantitative estimate on the rate of convergence to
equilibrium. By means of functional inequalities, an $L^q$-gradient estimate for a heat
semigroup $P_t$

$$ |\nabla P_t f|(x) \le e^{-kt} P_t(|\nabla f|^q)(x)^{1/q} \tag{1.1} $$

has been known to be a very powerful tool. It implies several functional inequalities such
as Poincaré inequalities (when $q = 2$) and logarithmic Sobolev inequalities (when $q = 1$),
which quantify convergence rates (see [2, 4, 5, 21] and references therein). As a different
approach to this problem, F. Otto [30] discussed a contraction of the $L^p$-Wasserstein distance

$$ d_p^W(\mu_t, \nu_t) \le e^{-kt} d_p^W(\mu_0, \nu_0) \tag{1.2} $$

for two (linear or nonlinear) diffusions $\mu_t, \nu_t$ of masses when $p = 2$. His heuristic
observation based on the geometry of the $L^2$-Wasserstein space has been a source of enormous
developments in the theory of optimal transport (see [36] and references therein).
Investigating the relation between these formulations connects different
approaches, and hence it is an interesting problem. M.-K. von Renesse and K.-Th. Sturm
[37] unified several formulations of this kind for the linear heat equation on a complete
Riemannian manifold. As a consequence of their work, (1.1) or (1.2) is shown to be equivalent
to the presence of a lower Ricci curvature bound by $k$ (it also holds for $k < 0$). However, in
a more general framework, such a duality has been known only when $p = 1$ and
$q = \infty$, which is the weakest form of both (1.1) and (1.2).

The main result of this paper extends the duality to that between an $L^q$-gradient
estimate and an $L^p$-Wasserstein control for $p, q \in [1, \infty]$ with $p^{-1} + q^{-1} = 1$ beyond the
case of a heat flow on a complete Riemannian manifold (see Theorem 2.2 for the precise
statement). We should emphasize that our duality does not require any other kind of
curvature condition. An $L^\infty$-Wasserstein control has been used in the literature as a tool
to show an $L^1$-gradient estimate in a coupling method for stochastic processes (for instance,
see [38] and references therein). In the case of heat flows on a complete Riemannian
manifold, any construction of a coupling which carries an $L^\infty$-Wasserstein control relies on
lower Ricci curvature bounds. In fact, such an argument was used in von Renesse and
Sturm's work. As a result, their proof employs a lower Ricci curvature bound to deduce
Wasserstein controls from gradient estimates. Our result enables us to derive Wasserstein
controls directly from gradient estimates. Such an implication was not known even in the
case of heat flows on a Riemannian manifold. Furthermore, this is a great advantage
when an appropriate notion of lower curvature bound is lacking.

Our work is strongly motivated by recent developments on gradient estimates on a
Lie group endowed with a sub-Riemannian structure [5, 8, 12, 13, 22, 27]. To explain a
consequence of our duality, we deal with the 3-dimensional Heisenberg group here. It is the
simplest example of a space possessing a genuinely sub-Riemannian (non-Riemannian) structure,
playing the role that flat Euclidean space plays in Riemannian geometry. But, unlike Euclidean spaces, some results
[12, 18] indicate that the "Ricci curvature" should be regarded as being unbounded from
below (in a generalized sense). Nevertheless, $L^q$-gradient estimates hold for $q \in [1, \infty]$
with a constant $K > 1$ instead of $e^{-kt}$ in (1.1) [5, 12, 13, 22]. We can apply our duality
to this case to obtain the corresponding $L^p$-Wasserstein control for any $p \in [1, \infty]$. In
the theory of optimal transport on the Heisenberg group, an $L^2$-Wasserstein control for
the heat flow would be important (cf. [17]). From a probabilistic point of view, the heat flow
is described by the motion of a pair consisting of the 2-dimensional Euclidean Brownian motion and
the associated Lévy stochastic area. Our $L^\infty$-Wasserstein control means the existence
of a coupling of two particles so that the distance between them at time $t$ is controlled
by the initial distance almost surely. It is sometimes a complicated issue to construct a
"well-behaved" coupling in the absence of curvature bounds. In particular, see [9, 20] for
works on a successful coupling on the Heisenberg group and its extension. Note that our
formulation also fits the study of a heat semigroup under backward (super-)Ricci flow,
in which case Wasserstein contractions with respect to a time-dependent distance function
have been shown recently [3, 26].

The notion of lower Ricci curvature bound has been extended in many ways. Although
our result does not need those notions, they should be related since (1.1) and (1.2) are
analytic and probabilistic characterizations of a lower Ricci curvature bound respectively.
Here we review two extensions and observe how these are connected with our result. In an
analytic way, D. Bakry and M. Émery [6] (see also [2] and references therein) extend the
notion of lower Ricci curvature bound to the $\Gamma_2$-criterion or curvature-dimension condition.
In an abstract framework where it works, a $\Gamma_2$-criterion is equivalent to an /^-gradient
estimate. Note that their notion of gradient is different from ours. But, once these
two notions coincide, a $\Gamma_2$-criterion becomes equivalent to Wasserstein control with
the aid of our result. In a sufficiently regular case such as diffusions on a manifold, such an
equivalence is well known. Our result possibly provides an extension of this equivalence.
In connection with the theory of optimal transport, convexities of entropy functionals
are proposed by J. Lott, C. Villani and K.-Th. Sturm [24, 34] as a natural extension of
lower Ricci curvature bounds. Under this condition, the existence of a heat flow and an
$L^2$-Wasserstein control follow in some cases beyond Riemannian manifolds [29, 32] (see
[14, 36] for the case on a Riemannian manifold). With the aid of Theorem 8 in [32], we
can apply our duality to show an $L^2$-gradient estimate for the heat semigroup.

The idea of the proof of our main theorem is simple. The implication from a Wasserstein
control to the corresponding gradient estimate is just a slight modification of existing
arguments. The converse is based on the Kantorovich duality. If $p = 1$, the Kantorovich
duality becomes the Kantorovich-Rubinstein formula and the problem becomes much
simpler. In the case $p > 1$, we employ a general theory of Hamilton-Jacobi semigroups
developed in [7, 23] to analyze the variational formula. When $p = \infty$, we use an approximation
of $p$ by finite numbers because we are no longer able to apply the Kantorovich
duality directly. Note that no semigroup property for heat semigroups is required in the
proof. While keeping such generality, our duality is sufficiently sharp in the sense that
the control rate does not change when we obtain one estimate from the other: the
same $e^{-kt}$ appears in both (1.1) and (1.2).

The organization of this paper is as follows. In the next section, we introduce our
framework and state our main theorem. We review the notions of Wasserstein distance
and gradient there. Our main theorem is shown in section 3. For the proof, we show
basic properties of Wasserstein distances and summarize recent results on Hamilton-Jacobi
semigroups there. In section 4, we consider a heat flow on a sub-Riemannian manifold and
apply our main theorem to these cases.

2 Framework and the main result 

Let $(X, d)$ be a complete, separable, proper, length metric space. Here, we say that $d$
is a length metric if, for every $x, y \in X$, $d(x, y)$ equals the infimum of the lengths of curves
joining $x$ and $y$. Properness means that all closed metric balls in $X$ of finite radii are
compact. Under these assumptions, there exists a curve joining $x$ and $y$ whose length
realizes $d(x, y)$ for each $x, y$ (see [10], for instance). We call it a minimal geodesic. Let $\tilde d$
be a continuous distance function on $X$, possibly different from $d$. Assume that for any
$x, y \in X$, there is a minimal geodesic with respect to $\tilde d$ joining $x$ and $y$. We call such a
curve a "$\tilde d$-minimal geodesic".

For two probability measures $\mu$ and $\nu$ on $X$, we denote the space of all couplings of $\mu$
and $\nu$ by $\Pi(\mu, \nu)$. That is, $\pi \in \Pi(\mu, \nu)$ means that $\pi$ is a probability measure on $X \times X$
satisfying $\pi(A \times X) = \mu(A)$ and $\pi(X \times A) = \nu(A)$ for each Borel set $A$. For $p \in [1, \infty]$
and a measurable function $\rho : X \times X \to [0, \infty)$, we define $\rho_p^W(\mu, \nu)$ by

$$ \rho_p^W(\mu, \nu) := \inf \{ \|\rho\|_{L^p(\pi)} : \pi \in \Pi(\mu, \nu) \}. \tag{2.1} $$

We are interested in the cases $\rho = d$ and $\rho = \tilde d$. If $d_p^W(\mu, \nu) < \infty$, then there always exists
a minimizer of the infimum on the right-hand side of (2.1). In addition, $d_p^W$ satisfies all
properties of a distance function on the space of probability measures, though it may take
the value $+\infty$. The same is true for $\tilde d_p^W$. These facts are well known for $p \in [1, \infty)$
and can be shown similarly when $p = \infty$. It is sometimes reasonable to restrict $d_p^W$
to probability measures having finite $p$-th moments in order to ensure $d_p^W(\mu, \nu) < \infty$.
But, in this paper, we do not adopt such a restriction. Note that, when $p < \infty$, the
restriction of $d_p^W$ is usually called the $L^p$-Wasserstein distance. See [35] for more details and a
proof of these facts.
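As a concrete illustration of (2.1), the following minimal sketch evaluates $d_p^W$ on the real line. It assumes two empirical measures with the same number of equally weighted atoms, in which case pairing the sorted samples is an optimal coupling for every $p \in [1, \infty]$. The function name `wasserstein_1d` and the sample data are hypothetical and not part of the paper.

```python
def wasserstein_1d(xs, ys, p):
    """L^p-Wasserstein distance between two empirical measures on the real line.

    Assumption: both samples have the same number of equally weighted atoms,
    so that pairing the sorted samples is an optimal coupling.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal size")
    gaps = [abs(a - b) for a, b in zip(sorted(xs), sorted(ys))]
    if p == float("inf"):
        # L^infinity norm of the cost under the sorted coupling
        return max(gaps)
    return (sum(g ** p for g in gaps) / len(gaps)) ** (1.0 / p)


mu = [0.0, 1.0, 4.0]
nu = [0.5, 3.0, 4.0]
for p in (1, 2, 8, 64, float("inf")):
    print(p, wasserstein_1d(mu, nu, p))
```

The printed values are nondecreasing in $p$ and approach the value for $p = \infty$, in line with the monotonicity coming from the Hölder inequality.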

Let $C_b(X)$ be the space of bounded continuous functions on $X$ equipped with the
supremum norm. Let $C_L(X)$ be the collection of all Lipschitz continuous functions on
$X$ and $C_{b,L}(X) := C_b(X) \cap C_L(X)$. Note that, if we merely say "Lipschitz", it means
"Lipschitz with respect to $d$". For Lipschitz continuity with respect to $\tilde d$, we use the
expression "$\tilde d$-Lipschitz".

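For a measurable function $f$ on $X$ and $x \in X$, we define $|\nabla_d f|(x)$ by

$$ |\nabla_d f|(x) := \lim_{r \downarrow 0} \sup_{0 < d(x,y) < r} \frac{|f(x) - f(y)|}{d(x, y)}. $$

We set $\|\nabla_d f\|_\infty := \sup_{x \in X} |\nabla_d f|(x)$. Note that $\|\nabla_d f\|_\infty < \infty$ holds if and only if
$f \in C_L(X)$. In addition, for $f \in C_L(X)$,

$$ \|\nabla_d f\|_\infty = \sup_{x \ne y} \frac{|f(x) - f(y)|}{d(x, y)}. \tag{2.2} $$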



For a pair of measurable functions $f$ and $g$ on $X$, we say that $g$ is an upper gradient of $f$
if, for each rectifiable curve $\gamma : [0, l] \to X$ parametrized by arc length, we have

$$ |f(\gamma(l)) - f(\gamma(0))| \le \int_0^l g(\gamma(s)) \, ds. $$

We will use the following fact as a basic tool.

Lemma 2.1 ([11, Proposition 1.11], [16, Proposition 10.2]) For $f \in C_L(X)$, $|\nabla_d f|$
is an upper gradient of $f$.

We also use the same notation for $\tilde d$. All the properties described above for $|\nabla_d f|$,
including Lemma 2.1, are also true for $|\nabla_{\tilde d} f|$.

Let $\mathcal{P}(X)$ be the space of all probability measures on $X$ equipped with the topology
of weak convergence. Let $(P_x)_{x \in X}$ be a family of elements in $\mathcal{P}(X)$. Assume that $x \mapsto P_x$
is continuous as a map from $X$ to $\mathcal{P}(X)$. Then $(P_x)_{x \in X}$ defines a bounded linear operator
$P$ on $C_b(X)$ by $Pf(x) := \int_X f(y) P_x(dy)$. Let $P^*$ be the adjoint operator of $P$. Note that
$P^*(\mathcal{P}(X)) \subset \mathcal{P}(X)$ holds.

For describing our main theorem, we state the following conditions: 

Assumption 1 There exists a positive Radon measure $v$ on $X$ such that

(i) $(X, d, v)$ enjoys the local volume doubling condition. That is, there are constants
$D, R_1 > 0$ such that $v(B_{2r}(x)) \le D v(B_r(x))$ holds for all $x \in X$ and $r \in (0, R_1)$.

(ii) $(X, d, v)$ supports a $(1, p_0)$-local Poincaré inequality for some $p_0 \ge 1$. That is, for
every $R > 0$, there are constants $\lambda \ge 1$ and $C_P > 0$ such that, for any $f \in L^1_{\mathrm{loc}}(v)$
and any upper gradient $g$ of $f$,

$$ \frac{1}{v(B_r(x))} \int_{B_r(x)} |f - f_{x,r}| \, dv \le C_P \, r \left( \frac{1}{v(B_{\lambda r}(x))} \int_{B_{\lambda r}(x)} g^{p_0} \, dv \right)^{1/p_0} \tag{2.3} $$

holds for every $x \in X$ and $r \in (0, R)$, where $f_{x,r} := v(B_r(x))^{-1} \int_{B_r(x)} f \, dv$.

(iii) $P_x$ is absolutely continuous with respect to $v$ for all $x \in X$; $P_x(dy) = P_x(y) v(dy)$.
In addition, the density $P_x(y)$ is continuous with respect to $x$.

We are now in a position to state our main theorem.

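Theorem 2.2 Suppose that Assumption 1 holds. Then, for any $p \in [1, \infty]$, the following
are equivalent:

(i) For all $\mu, \nu \in \mathcal{P}(X)$,

$$ d_p^W(P^*\mu, P^*\nu) \le \tilde d_p^W(\mu, \nu). \tag{$C_p$} $$

(ii) When $p > 1$, for all $f \in C_{b,L}(X)$ and $x \in X$,

$$ |\nabla_{\tilde d} Pf|(x) \le P(|\nabla_d f|^q)(x)^{1/q}, \tag{$G_q$} $$

where $q$ is the Hölder conjugate of $p$; $1/p + 1/q = 1$. When $p = 1$, for all $f \in C_{b,L}(X)$,

$$ \|\nabla_{\tilde d} Pf\|_\infty \le \|\nabla_d f\|_\infty. \tag{$G_\infty$} $$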
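To make the two sides of the theorem tangible, here is a small numerical sketch for a model kernel that is not taken from the paper: the Ornstein-Uhlenbeck kernel on the real line, $P_x = N(e^{-t}x, 1 - e^{-2t})$, with $\tilde d = e^{-t} d$. Driving two copies with the same Gaussian variable gives a coupling with distance exactly $e^{-t}|x - y|$, which is the $(C_\infty)$ side; the code checks the gradient side $|(P_t f)'(x)| \le e^{-t} P_t(|f'|^q)(x)^{1/q}$ by quadrature. The function names, the quadrature parameters and the finite-difference step are all illustrative assumptions.

```python
import math


def ou_semigroup(f, t, x, n=4000, z_max=8.0):
    """P_t f(x) = E[f(e^{-t} x + sqrt(1 - e^{-2t}) Z)], Z ~ N(0, 1), via the trapezoid rule."""
    a = math.exp(-t)
    b = math.sqrt(1.0 - math.exp(-2.0 * t))
    h = 2.0 * z_max / n
    total = 0.0
    for i in range(n + 1):
        z = -z_max + i * h
        weight = 0.5 if i in (0, n) else 1.0  # trapezoid end-point weights
        total += weight * f(a * x + b * z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)


def gradient_estimate_gap(f, df, t, x, q, eps=1e-5):
    """Return rhs - lhs for |d/dx P_t f|(x) <= e^{-t} * P_t(|f'|^q)(x)^{1/q}."""
    # central finite difference for the derivative of x -> P_t f(x)
    lhs = abs(ou_semigroup(f, t, x + eps) - ou_semigroup(f, t, x - eps)) / (2.0 * eps)
    rhs = math.exp(-t) * ou_semigroup(lambda y: abs(df(y)) ** q, t, x) ** (1.0 / q)
    return rhs - lhs


# The gap should be nonnegative up to discretization error.
print(gradient_estimate_gap(math.sin, math.cos, t=0.5, x=0.3, q=2.0))
```

The estimate holds here because differentiation commutes with this kernel up to the factor $e^{-t}$; the sketch only illustrates the shape of $(G_q)$ and is not a substitute for the proof.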

Remark 2.3 We give several remarks on Assumption 1 and Theorem 2.2.

(i) If Assumption 1 (i) holds, then Assumption 1 (ii) follows once we obtain (2.3) with
$p_0 = 1$ for some $R > 0$ by a well-known argument. See [31, Lemma 5.3.1], for
instance. The same is true for a (2,2)-Poincaré inequality, which yields a (1,2)-Poincaré
inequality.

(ii) It is shown in [11] that, under Assumption 1 (i) (ii), $|\nabla_d f|$ coincides with an $L^{p_0}$-minimal
generalized upper gradient $g_f$ for those $f$ for which $g_f$ is well-defined. This
fact itself is not used in this article. But, it will be helpful when we apply our main
theorem to more concrete problems. In fact, the notion of minimal generalized upper
gradients is regarded as a sort of weak derivative in the theory of Sobolev spaces.
We can identify these two notions on Euclidean spaces or Riemannian manifolds.

(iii) Assumption 1 is used only when we show the implication $(G_q) \Rightarrow (C_p)$ for $p \in (1, \infty]$.
Thus the rest holds true without Assumption 1. We need Assumption 1 (i) (ii) only
for employing a property of Hamilton-Jacobi semigroups. To make these facts clear,
in the rest of this paper, we will mention Assumption 1 whenever we require it.

(iv) The duality between (1.1) and (1.2) is recovered by choosing $P = P_t$ and $\tilde d = e^{-kt} d$.
The case where $\tilde d$ is essentially different from $d$ occurs naturally if we consider a heat flow
under a backward (super-)Ricci flow (see [3, 26]).

(v) Obviously $(G_q)$ implies $(G_{q'})$ for $q, q' \in [1, \infty]$ with $q \le q'$ by the Hölder inequality.
The dual implication $(C_p) \Rightarrow (C_{p'})$ for $p, p' \in [1, \infty]$ with $p \ge p'$ also holds true
without using the equivalence in Theorem 2.2 (see Corollary 3.4 below). For a heat
flow on a Riemannian manifold (i.e. $P = P_t$ and $\tilde d = e^{-kt} d$), if $(C_p)$ or $(G_q)$ holds for
some $p, q \in [1, \infty]$, then $(C_p)$ and $(G_q)$ hold for any $p, q \in [1, \infty]$. At this moment, it is
not clear what condition guarantees such a "$p$-independence".



3 Proof of Theorem 2.2 

We begin by showing the implication $(C_p) \Rightarrow (G_q)$.

Proposition 3.1 Suppose $(C_p)$ for $p \in [1, \infty]$. Then $(G_q)$ holds for $q \in [1, \infty]$ with
$p^{-1} + q^{-1} = 1$.

Proof. For $x, y \in X$, take $\pi_{xy} \in \Pi(P_x, P_y)$ such that $\|d\|_{L^p(\pi_{xy})} = d_p^W(P_x, P_y)$. Since
$P_z = P^*\delta_z$ for $z \in X$, $(C_p)$ yields $d_p^W(P_x, P_y) \le \tilde d_p^W(\delta_x, \delta_y) = \tilde d(x, y)$. For $f \in C_{b,L}(X)$,

$$ |Pf(x) - Pf(y)| = \left| \int_X f \, dP_x - \int_X f \, dP_y \right| \le \int_{X \times X} |f(z) - f(w)| \, \pi_{xy}(dz\,dw). $$

(i) The case $p = 1$: (2.2) together with $(C_1)$ implies

$$ \int_{X \times X} |f(z) - f(w)| \, \pi_{xy}(dz\,dw) \le \|\nabla_d f\|_\infty \, d_1^W(P_x, P_y) \le \|\nabla_d f\|_\infty \, \tilde d(x, y). $$

Hence, by dividing the above inequalities by $\tilde d(x, y)$ and by taking the supremum over $x \ne y$,
the conclusion follows.

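(ii) The case $p \in (1, \infty)$: Let us define $G_r : X \to \mathbb{R}$ by

$$ G_r(z) := \sup_{w \in B_r(z) \setminus \{z\}} \frac{|f(z) - f(w)|}{d(z, w)}. $$

Set $r := \tilde d(x, y)^{1/(2q)}$. The Hölder inequality and the Chebyshev inequality yield

$$
\begin{aligned}
\int_{X \times X} |f(z) - f(w)| \, \pi_{xy}(dz\,dw)
&= \int_{X \times X} \frac{|f(z) - f(w)|}{d(z, w)} 1_{\{0 < d(z,w) < r\}} \, d(z, w) \, \pi_{xy}(dz\,dw)
 + \int_{X \times X} |f(z) - f(w)| 1_{\{d(z,w) \ge r\}} \, \pi_{xy}(dz\,dw) \\
&\le \left\{ \int_{X \times X} \left( \frac{|f(z) - f(w)|}{d(z, w)} \right)^q 1_{\{0 < d(z,w) < r\}} \, \pi_{xy}(dz\,dw) \right\}^{1/q} \|d\|_{L^p(\pi_{xy})}
 + 2 \|f\|_\infty r^{-p} \|d\|_{L^p(\pi_{xy})}^p \\
&\le \|G_r\|_{L^q(P_x)} \, d_p^W(P_x, P_y) + 2 \|f\|_\infty r^{-p} \, d_p^W(P_x, P_y)^p \\
&\le \|G_r\|_{L^q(P_x)} \, \tilde d(x, y) + 2 \|f\|_\infty \, \tilde d(x, y)^{1 + (p-1)/2}.
\end{aligned}
$$

Here the last inequality follows from $(C_p)$ and the choice of $r$. Since $\lim_{y \to x} r = 0$, $\lim_{y \to x} G_r(z) = |\nabla_d f|(z)$
holds. By virtue of $|G_r(z)| \le \|\nabla_d f\|_\infty$, we can apply the dominated convergence theorem
to obtain $\lim_{y \to x} \|G_r\|_{L^q(P_x)} = \| |\nabla_d f| \|_{L^q(P_x)}$. Thus, by dividing the above inequalities by
$\tilde d(x, y)$ and by letting $y \to x$, the conclusion follows.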

(iii) The case $p = \infty$: $(C_\infty)$ implies $d(z, w) \le \tilde d(x, y)$ for $\pi_{xy}$-a.e. $(z, w)$. Hence we
have

$$ \int_{X \times X} |f(z) - f(w)| \, \pi_{xy}(dz\,dw) \le \tilde d(x, y) \, \|G_{\tilde d(x,y)}\|_{L^1(P_x)}. $$

Thus the proof is completed by an argument similar to the one above. □

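For the converse implication, we first show two auxiliary lemmas concerning Wasserstein
distances. The first one will be used to deal with the $L^\infty$-Wasserstein distance.

Lemma 3.2 Let $\rho : X \times X \to [0, \infty)$ be a continuous function. Then $\lim_{p \to \infty} \rho_p^W(\mu, \nu) =
\rho_\infty^W(\mu, \nu)$ for any $\mu, \nu \in \mathcal{P}(X)$.

Proof. Note that $\rho_p^W(\mu, \nu)$ is increasing in $p$ by the Hölder inequality. Hence $C :=
\lim_{p \to \infty} \rho_p^W(\mu, \nu) \in [0, \infty]$ exists. Take $\pi_n \in \Pi(\mu, \nu)$ for $n \in \mathbb{N}$ such that $\rho_n^W(\mu, \nu) =
\|\rho\|_{L^n(\pi_n)}$ holds. Since $\pi_n \in \Pi(\mu, \nu)$, $(\pi_n)_{n \in \mathbb{N}}$ is tight. Thus there exists a convergent
subsequence $(\pi_{n_k})_{k \in \mathbb{N}}$ of $(\pi_n)_{n \in \mathbb{N}}$. We denote the limit of $\pi_{n_k}$ by $\pi_\infty$. Take $R > 0$ and
$n \in \mathbb{N}$ arbitrarily. Since $\rho \wedge R \in C_b(X \times X)$, we have

$$ \|\rho \wedge R\|_{L^n(\pi_\infty)} = \lim_{k \to \infty} \|\rho \wedge R\|_{L^n(\pi_{n_k})} \le \lim_{k \to \infty} \|\rho\|_{L^{n_k}(\pi_{n_k})} = C. $$

Here the inequality follows from the Hölder inequality for sufficiently large $k$. Thus, as
$R \to \infty$ and $n \to \infty$, we obtain $\|\rho\|_{L^\infty(\pi_\infty)} \le C$. Thus the assertion holds if $\rho_\infty^W(\mu, \nu) =
\infty$. When $\rho_\infty^W(\mu, \nu) < \infty$, we can take $\pi \in \Pi(\mu, \nu)$ such that $\|\rho\|_{L^\infty(\pi)} < \infty$. Then
$\rho_p^W(\mu, \nu) \le \|\rho\|_{L^p(\pi)} \le \|\rho\|_{L^\infty(\pi)}$. Thus $C \le \|\rho\|_{L^\infty(\pi)}$ holds. It yields $C \le \rho_\infty^W(\mu, \nu)$ and
hence the conclusion holds. □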

The next one is useful for reducing the problem to a simpler case.

Lemma 3.3 If $(C_p)$ holds for any pair of Dirac measures, then $(C_p)$ holds for any $\mu, \nu \in
\mathcal{P}(X)$.

Although this is probably well known to experts at least when $p \in [1, \infty)$, we give a proof
for completeness.

Proof. First we consider the case $p < \infty$. Given $\mu, \nu \in \mathcal{P}(X)$, take $\pi \in \Pi(\mu, \nu)$ so
that $\|\tilde d\|_{L^p(\pi)} = \tilde d_p^W(\mu, \nu)$. We may assume $\tilde d_p^W(\mu, \nu) < \infty$ without loss of generality.
For $x, y \in X$, take $P_{x,y} \in \Pi(P_x, P_y)$ so that $\|d\|_{L^p(P_{x,y})} = d_p^W(P_x, P_y)$. By Corollary 5.22
of [36], we can choose $\{P_{x,y}\}_{x,y \in X}$ so that the map $(x, y) \mapsto P_{x,y}$ is measurable. Define
$\tilde\pi \in \Pi(P^*\mu, P^*\nu)$ by $\tilde\pi(A) := \int_{X \times X} P_{x,y}(A) \, \pi(dx\,dy)$. Then $(C_p)$ for Dirac measures implies

$$ d_p^W(P^*\mu, P^*\nu) \le \|d\|_{L^p(\tilde\pi)} = \left\{ \int_{X \times X} \|d\|_{L^p(P_{x,y})}^p \, \pi(dx\,dy) \right\}^{1/p} \le \|\tilde d\|_{L^p(\pi)} = \tilde d_p^W(\mu, \nu). $$

Thus the assertion holds. When $p = \infty$, $(C_\infty)$ for Dirac measures implies $(C_{p'})$ for Dirac
measures for any $1 \le p' < \infty$. Thus we obtain $(C_{p'})$ for any $\mu, \nu \in \mathcal{P}(X)$. Hence applying
Lemma 3.2 for $\rho = d$ and $\rho = \tilde d$ yields the conclusion. □
By the Hölder inequality, $(C_p)$ for Dirac measures yields $(C_{p'})$ for Dirac measures if
$p' < p$. Thus we obtain the following as a by-product of Lemma 3.3.

Corollary 3.4 $(C_p)$ implies $(C_{p'})$ for any $p, p' \in [1, \infty]$ with $p \ge p'$.

Next we introduce the notion and some properties of the Hamilton-Jacobi semigroup,
which plays an essential role in the sequel. Let $L : [0, \infty) \to [0, \infty)$ be a convex superlinear
function with $L(0) = 0$. Note that $L$ is continuous and increasing. We denote the Legendre
conjugate of $L$ by $L^* : [0, \infty) \to [0, \infty)$, which is given by $L^*(z) = \sup_{w \ge 0} [wz - L(w)]$.
For $f \in C_b(X)$ and $t > 0$, we define a function $Q_t f$ on $X$ by

$$ Q_t f(x) := \inf_{y \in X} \left[ f(y) + t L\!\left( \frac{d(x, y)}{t} \right) \right]. $$

For convenience, we write $Q_0 f := f$. We call $Q_t$ the Hamilton-Jacobi semigroup associated
with $L$. Several basic properties of $Q_t f$ in an abstract framework are studied in [7, 23].
In [23], $X$ is assumed to be compact and $L(s) = s^2$. In [7], $f \in C_L(X)$ is assumed.
Among them, the following are all we need in this paper.

Lemma 3.5 ([7, Theorem 2.5], [23, Theorem 2.5])

(i) $\inf_{y \in X} f(y) \le Q_t f(x) \le f(x)$. In particular, $Q_t f \in C_b(X)$.

(ii) $Q_t(Q_s f) = Q_{t+s} f$.

(iii) $Q_t f(x)$ is nonincreasing in $t$ and $\lim_{t \downarrow 0} Q_t f(x) = f(x)$.

(iv) Set $u(t, x) = Q_t f(x)$. If $f \in C_L(X)$, then $u \in C_L((0, \infty) \times X)$. Moreover,

$$ \sup \frac{|u(t, x) - u(s, y)|}{|t - s| + d(x, y)} \le \|\nabla_d f\|_\infty \vee L^*(\|\nabla_d f\|_\infty). $$

(v) Suppose Assumption 1 (i) (ii). Then, for $t > 0$ and $v$-a.e. $x \in X$, $Q_t f$ satisfies the
Hamilton-Jacobi equation associated with $L^*$:

$$ \lim_{s \downarrow 0} \frac{Q_{t+s} f(x) - Q_t f(x)}{s} = -L^*(|\nabla_d Q_t f|(x)). $$

We do not use Lemma 3.5 (i) (ii) in the sequel, but they explain well why we call $Q_t$ a "semigroup".
Note that Lemma 3.5 (v) is shown in [7, 23] for the subgradient norm instead of the
gradient norm $|\nabla_d f|$. Since these two notions coincide $v$-almost everywhere in this case
(see [23, Remark 2.27]), Lemma 3.5 (v) is still valid.
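The infimum formula for $Q_t f$ can be evaluated directly on a finite grid. The following is a minimal sketch, assuming $X$ is replaced by a uniform grid on an interval of the real line with $d(x, y) = |x - y|$ and $L(s) = s^p/p$; the function `hopf_lax`, the grid and the test function are hypothetical choices made only for illustration. On such a grid the bounds in Lemma 3.5 (i) and the monotonicity in (iii) can be observed, whereas the semigroup property (ii) is not expected to hold exactly because a grid is not a length space.

```python
def hopf_lax(f_vals, grid, t, p):
    """Discrete version of Q_t f(x) = min_y [f(y) + t * L(|x - y| / t)] with L(s) = s**p / p."""
    if t == 0:
        return list(f_vals)  # convention Q_0 f = f
    return [
        min(fy + t * (abs(x - y) / t) ** p / p for y, fy in zip(grid, f_vals))
        for x in grid
    ]


grid = [i / 50.0 for i in range(-100, 101)]    # uniform grid on [-2, 2]
f_vals = [min(abs(x), 1.0) for x in grid]      # a bounded Lipschitz function

q_half = hopf_lax(f_vals, grid, 0.5, 2)
q_one = hopf_lax(f_vals, grid, 1.0, 2)
print(max(q_half), max(q_one))
```

Taking $y = x$ in the minimum shows $Q_t f \le f$, and $t \mapsto t L(d/t)$ is nonincreasing for convex $L$ with $L(0) = 0$, which is why the values decrease as $t$ grows.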

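Finally, we review the Kantorovich duality (see [35, Theorem 1.3] or [36, Theorem 5.10],
for example). For $\mu, \nu \in \mathcal{P}(X)$ and $1 \le p < \infty$, the following duality holds:

$$
d_p^W(\mu, \nu)^p
= \sup \left\{ \int_X g \, d\mu - \int_X f \, d\nu \;:\; f, g \in C_b(X),\ g(y) - f(x) \le d(x, y)^p \text{ for all } x, y \in X \right\}
= \sup_{f \in C_b(X)} \left[ \int_X f^* \, d\mu - \int_X f \, d\nu \right], \tag{3.1}
$$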
where $f^*(y) := \inf_{x \in X} [f(x) + d(x, y)^p]$. In particular, when $p = 1$, (3.1) is written as
follows:

$$ d_1^W(\mu, \nu) = \sup_{\substack{f \in C_L(X) \\ \|\nabla_d f\|_\infty \le 1}} \left[ \int_X f \, d\mu - \int_X f \, d\nu \right]. \tag{3.2} $$

This is the so-called Kantorovich-Rubinstein formula (see [35, Theorem 1.14] or [36, Particular
Case 5.16]).
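The formula (3.2) can be checked by hand in a toy situation. The sketch below assumes two empirical measures on the real line with equally many equally weighted atoms, so that $d_1^W$ is given by the sorted pairing; the helper names and the sample data are hypothetical. Every 1-Lipschitz test function gives a lower bound for $d_1^W$, and for these particular samples the function $f(x) = -x$ attains it.

```python
import math


def w1_empirical(xs, ys):
    """d_1^W for two equally weighted samples of the same size on the real line."""
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def dual_value(f, xs, ys):
    """The quantity  integral(f dmu) - integral(f dnu)  for the empirical measures of xs and ys."""
    return sum(map(f, xs)) / len(xs) - sum(map(f, ys)) / len(ys)


mu = [0.0, 1.0, 4.0]
nu = [0.5, 3.0, 4.0]
lipschitz_tests = [lambda x: x, lambda x: -x, abs, math.sin, lambda x: min(x, 2.0)]

print("primal:", w1_empirical(mu, nu))
for f in lipschitz_tests:
    print("dual candidate:", dual_value(f, mu, nu))
```

Here $f(x) = -x$ is optimal because every atom of the second sample lies to the right of its partner in the sorted pairing.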

Remark 3.6 An inspection of the proof in [36] tells us that the latter supremum in
(3.1) can be approximated by elements of $C_{b,L}(X)$. Actually, in that proof, there appears
a sequence of pairs of functions $\phi_k, \psi_k \in C_b(X)$ approximating the former supremum in
(3.1) by taking $f = \psi_k$, $g = \phi_k$. We can easily verify $\psi_k \in C_{b,L}(X)$ and that $(\psi_k)_{k \in \mathbb{N}}$ also
approximates the latter supremum in (3.1). Moreover, we can assume that each element
of the approximating sequence has compact support without loss of generality, thanks to
the tightness of $\mu, \nu$ and the properness of $X$.

Now we are in a position to complete the proof of Theorem 2.2.

Proposition 3.7 Suppose that Assumption 1 holds. Then $(G_q)$ implies $(C_p)$ for $p, q \in
[1, \infty]$ with $p^{-1} + q^{-1} = 1$.

Proof. By virtue of Lemma 3.3, it suffices to show $(C_p)$ for $\mu = \delta_x$, $\nu = \delta_y$, $x \ne y$. Take
a $\tilde d$-minimal geodesic $\gamma : [0, 1] \to X$ from $y$ to $x$, which is re-parametrized to have
constant speed. Here "constant speed" means $\tilde d(\gamma_s, \gamma_t) = |s - t| \, \tilde d(x, y)$. Note that, by
$(G_q)$, $Pf$ is $\tilde d$-Lipschitz continuous if $f \in C_L(X)$.

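(i) The case $p = 1$: The Kantorovich-Rubinstein formula (3.2) yields

$$ d_1^W(P_x, P_y) = \sup_{\substack{f \in C_L(X) \\ \|\nabla_d f\|_\infty \le 1}} [Pf(x) - Pf(y)]. \tag{3.3} $$

For $f \in C_L(X)$, we can apply Lemma 2.1 to $Pf$. Thus $(G_\infty)$ yields

$$ |Pf(x) - Pf(y)| \le \tilde d(x, y) \int_0^1 |\nabla_{\tilde d} Pf|(\gamma_s) \, ds \le \|\nabla_d f\|_\infty \, \tilde d(x, y). $$

Combining this estimate with (3.3), the conclusion follows.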

(ii) The case $1 < p < \infty$: Let $(Q_t)_{t \ge 0}$ be the Hamilton-Jacobi semigroup associated with
$L(s) := p^{-1} s^p$. Note that its Legendre conjugate $L^*$ is computed as $L^*(s) = q^{-1} s^q$. By
(3.1) and Remark 3.6, we have

$$ d_p^W(P_x, P_y)^p = \sup_{f \in C_{b,L}(X)} [P(f^*)(x) - Pf(y)] = p \sup_{f \in C_{b,L}(X)} [P Q_1 f(x) - Pf(y)]. \tag{3.4} $$

To obtain an integral expression of the term in the above supremum (see (3.5) below), we
give some estimates. $(G_q)$ and Lemma 3.5 (iv) yield

$$ |\nabla_{\tilde d} P Q_s f|(z) \le \| |\nabla_d Q_s f| \|_{L^q(P_z)} \le \|\nabla_d f\|_\infty \vee L^*(\|\nabla_d f\|_\infty) $$
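for $s \ge 0$ and $z \in X$. Thus Lemma 2.1 and Lemma 3.5 (iv) imply

$$
\begin{aligned}
\left| \frac{P Q_{t+s} f(\gamma_{t+s}) - P Q_s f(\gamma_s)}{t} \right|
&\le \left| \frac{P Q_{t+s} f(\gamma_{t+s}) - P Q_{t+s} f(\gamma_s)}{t} \right|
 + \left| \int_X \frac{Q_{t+s} f - Q_s f}{t} \, dP_{\gamma_s} \right| \\
&\le \frac{\tilde d(x, y)}{t} \int_s^{t+s} |\nabla_{\tilde d} P Q_{t+s} f|(\gamma_u) \, du
 + \int_X \left| \frac{Q_{t+s} f - Q_s f}{t} \right| dP_{\gamma_s} \\
&\le (1 + \tilde d(x, y)) \left( \|\nabla_d f\|_\infty \vee L^*(\|\nabla_d f\|_\infty) \right)
\end{aligned}
$$

for $s \ge 0$. It means that $P Q_s f(\gamma_s)$ is Lipschitz continuous as a function of $s \in [0, 1]$.
Hence the derivative $\partial_s (P Q_s f(\gamma_s))$ exists for a.e. $s \in [0, 1]$ and we have

$$ P Q_1 f(x) - Pf(y) = \int_0^1 \partial_s (P Q_s f(\gamma_s)) \, ds. \tag{3.5} $$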



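Let $s \in (0, 1)$ be a point where $P Q_s f(\gamma_s)$ is differentiable. It implies

$$ \partial_s (P Q_s f(\gamma_s)) = \lim_{t \downarrow 0} \frac{P Q_{s+t} f(\gamma_{s+t}) - P Q_s f(\gamma_s)}{t}. \tag{3.6} $$

We have

$$ \frac{P Q_{s+t} f(\gamma_{s+t}) - P Q_s f(\gamma_s)}{t} = \int_X \frac{Q_{s+t} f - Q_s f}{t} \, dP_{\gamma_{s+t}} + \frac{P Q_s f(\gamma_{s+t}) - P Q_s f(\gamma_s)}{t}. \tag{3.7} $$

By Lemma 2.1 together with $(G_q)$,

$$ P Q_s f(\gamma_{s+t}) - P Q_s f(\gamma_s) \le \tilde d(x, y) \int_s^{s+t} \left\{ P(|\nabla_d Q_s f|^q)(\gamma_u) \right\}^{1/q} du. \tag{3.8} $$

By virtue of Assumption 1 (iii), the Fatou lemma together with the boundedness of
$|\nabla_d Q_s f|$ implies that $P(|\nabla_d Q_s f|^q)(\gamma_u)$ is upper semi-continuous in $u$. Thus (3.8) yields

$$ \limsup_{t \downarrow 0} \frac{P Q_s f(\gamma_{s+t}) - P Q_s f(\gamma_s)}{t} \le \tilde d(x, y) \, \| |\nabla_d Q_s f| \|_{L^q(P_{\gamma_s})}. $$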

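For the first term in (3.7), Lemma 3.5 (iii) implies that the integrand is nonpositive. Thanks
to Assumption 1 (i) (ii), Lemma 3.5 (v) is applicable to the integrand. Thus the Fatou
lemma together with Assumption 1 (iii) yields

$$
\begin{aligned}
\limsup_{t \downarrow 0} \int_X \frac{Q_{t+s} f - Q_s f}{t} \, dP_{\gamma_{s+t}}
&= \limsup_{t \downarrow 0} \int_X \frac{Q_{t+s} f(z) - Q_s f(z)}{t} \, P_{\gamma_{s+t}}(z) \, v(dz) \\
&\le \int_X \limsup_{t \downarrow 0} \left\{ \frac{Q_{t+s} f(z) - Q_s f(z)}{t} \, P_{\gamma_{s+t}}(z) \right\} v(dz) \\
&= - \int_X L^*(|\nabla_d Q_s f|(z)) \, P_{\gamma_s}(z) \, v(dz).
\end{aligned}
\tag{3.9}
$$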
Combining (3.7), (3.8) and (3.9) with (3.5) and (3.6),

$$
P Q_1 f(x) - Pf(y)
\le \int_0^1 \left( \tilde d(x, y) \, \| |\nabla_d Q_s f| \|_{L^q(P_{\gamma_s})} - L^*\!\left( \| |\nabla_d Q_s f| \|_{L^q(P_{\gamma_s})} \right) \right) ds
\le L(\tilde d(x, y)),
$$

where the second inequality comes from the definition of $L^*$ as the Legendre conjugate.
Substituting this estimate into (3.4), we obtain the desired estimate.
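The last step of case (ii) rests on two elementary facts about $L(s) = s^p/p$: its Legendre conjugate is $L^*(z) = z^q/q$, and $ab - L^*(b) \le L(a)$, so that multiplying by $p$ gives $p L(\tilde d(x,y)) = \tilde d(x,y)^p$. The following minimal sketch checks both numerically by a brute-force grid search; the function `legendre_conjugate`, the grid parameters and the exponent $p = 3$ are illustrative assumptions.

```python
def legendre_conjugate(L, z, w_max=10.0, n=100000):
    """Approximate L*(z) = sup_{w >= 0} [w * z - L(w)] by a grid search over [0, w_max]."""
    return max(w * z - L(w) for w in (w_max * i / n for i in range(n + 1)))


p = 3.0
q = p / (p - 1.0)  # Hoelder conjugate exponent, here 1.5


def L(s):
    return s ** p / p


for z in (0.5, 1.0, 2.0, 4.0):
    # numerical conjugate next to the closed form z**q / q
    print(z, legendre_conjugate(L, z), z ** q / q)
```

The grid search is only reliable when the maximizer $w = z^{1/(p-1)}$ lies inside $[0, w_{\max}]$, which is the case for the values of $z$ used here.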

(iii) The case $p = \infty$: Since $(G_q)$ holds with $q = 1$, the Hölder inequality implies
$(G_q)$ for any $q > 1$. Thus we obtain $(C_p)$ for any $1 < p < \infty$. Therefore, by virtue of
Lemma 3.2, the conclusion follows by letting $p$ tend to $\infty$ in $(C_p)$. □

Remark 3.8 Our duality between $L^p$ and $L^q$ can be extended to a similar one between
Orlicz norms. In fact, there are Hölder-type inequalities (see [1], for instance) which can
be used in the implication (i) $\Rightarrow$ (ii). For the converse, all properties of the Hamilton-Jacobi
semigroup used in the proof still hold in such generality.

Remark 3.9 If $(C_p)$ holds with $p > 1$, then we obtain the following slightly stronger
version of $(G_\infty)$: for any $f \in C_{b,L}(X)$ and $x \in X$,

$$ |\nabla_{\tilde d} Pf|(x) \le \| |\nabla_d f| \|_{L^\infty(P_x)}. \tag{$G'_\infty$} $$

As we have seen in the proof of Proposition 3.7, the weaker condition $(G_\infty)$ is sufficient to
obtain $(C_1)$. At this moment, the author does not know any example in which $(C_p)$ holds only
for $p = 1$ and $(G'_\infty)$ fails.



4 Applications 

In a class of sub-Riemannian manifolds, $L^q$-gradient estimates of a subelliptic heat
semigroup have recently been shown by analytic methods. In these cases, we can obtain the
corresponding $L^p$-Wasserstein control via Theorem 2.2 though their notion of gradient looks
different from ours. To explain how we deal with it, we describe a general framework
of sub-Riemannian geometry generated by a family of vector fields. We refer to
[16, 28, 33] for details.

Throughout this section, we assume $X$ to be a finite dimensional, $\sigma$-compact,
connected, smooth differentiable manifold. Consider a family of vector fields $\{X_1, \ldots, X_n\}$
on $X$. We assume that $\{X_i(x)\}_{i=1}^n$ is linearly independent in $T_x X$ for all $x \in X$ and
that $\{X_i\}_{i=1}^n$ satisfies the Hörmander condition. The latter means that there exists a
number $m$ such that the family of vector fields generated by $\{X_i\}_{i=1}^n$ and their
commutators up to length $m$ spans $T_x X$ for each $x \in X$. Let $\mathcal{H} \subset TX$ be the subbundle
generated by $\{X_i\}_{i=1}^n$; $\mathcal{H}_x := \mathrm{Span}\{X_1(x), \ldots, X_n(x)\}$. We define a metric on $\mathcal{H}$ such
that $\{X_i(x)\}_{i=1}^n$ becomes an orthonormal basis of $\mathcal{H}_x$ for $x \in X$. We are interested in the
case $\mathcal{H} \ne TX$. Associated with this metric, we define a function $d$ on $X$ as follows. We
say that a piecewise smooth curve $\gamma : [0, 1] \to X$ is horizontal if $\dot\gamma(t) \in \mathcal{H}_{\gamma(t)}$ for every $t$ where $\gamma$
is differentiable. For $x, y \in X$, we define $d(x, y)$ by

$$ d(x, y) := \inf \left\{ \int_0^1 \|\dot\gamma(t)\|_{\mathcal{H}_{\gamma(t)}} \, dt \;:\; \gamma : [0, 1] \to X \text{ horizontal curve},\ \gamma(0) = x,\ \gamma(1) = y \right\}. $$
By the Chow theorem, the Hörmander condition ensures that $d(x, y) < \infty$ for $x, y \in X$.
As a result, the function $d : X \times X \to [0, \infty)$ becomes a distance. It is called the
Carnot-Carathéodory distance. Note that the topology determined by $d$ coincides with
the original one on $X$. We assume that $(X, d)$ is complete.
Let $v$ be a Borel measure on $X$ such that its restriction to each local coordinate chart has
a smooth density with respect to the Lebesgue measure associated with the coordinate.
Let $\Delta_{\mathcal{H}} := -\sum_{i=1}^n X_i^* X_i$ be the sub-Laplacian associated with $\{X_i\}_{i=1}^n$ and $v$. Here
$X_i^*$ is the adjoint operator of $X_i$ with respect to $v$. By the completeness of $d$, $\Delta_{\mathcal{H}}$ is
essentially selfadjoint (see [33]). Take the selfadjoint extension of $\Delta_{\mathcal{H}}$ (also denoted by
$\Delta_{\mathcal{H}}$) and consider the associated heat semigroup $P_t = \exp(t \Delta_{\mathcal{H}} / 2)$. By the hypoellipticity
of $\Delta_{\mathcal{H}}$, $P_t$ has a smooth density function with respect to $v$. In particular, $P_t$ becomes
a Feller semigroup. We assume that $P_t$ is conservative, i.e. $P_t 1 = 1$. For a smooth
function $f : X \to \mathbb{R}$, we define the carré du champ operator $\Gamma(f) : X \to \mathbb{R}$ by

$$ \Gamma(f)(x) := \sum_{i=1}^n |X_i f(x)|^2. $$

An $L^q$-gradient estimate for $P_t$ associated with $\Gamma$ is formulated as follows: given $q \in
[1, \infty)$, there exists $K_q(t) > 0$ for each $t > 0$ such that, for any $f \in C_c^\infty(X)$,

$$ \Gamma(P_t f)(x)^{1/2} \le K_q(t) \left\{ P_t\!\left( \Gamma(f)^{q/2} \right)(x) \right\}^{1/q}, \tag{4.1} $$

where $C_c^\infty(X)$ is the set of all smooth functions $f : X \to \mathbb{R}$ with compact support. As
we see in the following, (4.1) implies our gradient estimate.

Proposition 4.1 (4.1) for $f \in C_c^\infty(X)$ implies $(G_q)$ for $P = P_t$, $\tilde d = K_q(t) d$ and any
$f \in C_L(X)$ with compact support.

Proof. First we extend (4.1) to $f \in C_{b,L}(X)$. By virtue of Corollary 11.8 of [16],
for $f \in C_{b,L}(X)$, the distributional derivatives $\{X_i f\}_{i=1}^n$ are represented by bounded
functions and $\Gamma(f)^{1/2} \le \|\nabla_d f\|_\infty$ holds $v$-almost everywhere. Moreover, Theorem 11.7
of [16] implies $\Gamma(f)^{1/2} \le g_f$ for any upper gradient $g_f$. In particular, Lemma 2.1 implies
$\Gamma(f)^{1/2} \le |\nabla_d f|$. Though [16] discusses the case where $X$ is an open subset of a Euclidean
space, we can extend it to our case with the aid of a partition of unity. By a
mollifier argument together with a partition of unity again, we can take a sequence
$f_k \in C_c^\infty(X)$ such that $f_k \to f$ and $\Gamma f_k \to \Gamma f$ almost everywhere (cf. [16, Theorem 11.9]).
Thus (4.1) holds for any $f \in C_{b,L}(X)$ with compact support.

Note that $\Gamma(f)^{1/2}$ is an upper gradient if $f \in C^\infty(X)$ (see [16, Proposition 11.6], for
instance). Since $P_t f \in C^\infty(X)$ in our case, for a minimal geodesic $\gamma$ joining $x$ and $y$,
parametrized by arc length,

$$ |P_t f(x) - P_t f(y)| \le \int_0^{d(x,y)} \Gamma(P_t f)(\gamma_s)^{1/2} \, ds \le K_q(t) \int_0^{d(x,y)} \left\{ P_t\!\left( \Gamma(f)^{q/2} \right)(\gamma_s) \right\}^{1/q} ds. $$

Hence the conclusion follows by dividing the above inequality by $d(x, y)$ and by letting
$y \to x$. □
Remark 4.2 If we suppose Assumption 1 (i) (ii) in Proposition 4.1, then Theorem 6.1 of
[11] asserts that the minimal generalized upper gradient of $f$ coincides with $|\nabla_d f|$ almost
everywhere. Since the first part of the proof of Proposition 4.1 implies that $\Gamma(f)^{1/2}$ is the
minimal generalized upper gradient for $f \in C_L(X)$ with compact support, the proof
can be completed there in this case.

As far as the author knows, (4.1) is established in the following cases; 

• The case q — 1 with K\(t) = K for some K > on groups of type H [13] (including 
the Heisenberg group of arbitrary dimension, see [5, 22] also). 

• The case q > 1 on an arbitrary Lie group [27]. Especially, K p (t) = K p for some 
K p > if it is nilpotent. 

• The case q > 1 with K q (t) = if,e _ * for some K q > on SU(2) [8]. 

In all these cases, $\nu$ is chosen to be a right-invariant Haar measure and hence the associated 
sub-Laplacian is of the form $\Delta_{\mathcal H} = \sum_{i=1}^{n} X_i^2$. The conditions in Assumption 1 hold in these 
cases. Condition (iii) has already been observed. By the homogeneity of the space, we can reduce 
the assertion to the case of a Euclidean domain (see also Remark 2.3). Thus (i) and (ii) 
with $p = 1$ follow from Theorem 11.19 and Theorem 11.21 of [16]. Note that (4.1) is 
shown on a wider class of functions than $C_c^\infty(X)$ in some cases, but this is not necessary 
for our purpose. 

Combining Proposition 4.1 with Theorem 2.2 in these cases, we obtain $(C_p)$ for $P = P_t$ 
and $\tilde d = K_q(t) d$. Though $f$ is restricted to have a compact support in Proposition 4.1, this 
is sufficient to show $(C_p)$ (see Remark 3.6). 

The following simple examples illustrate the probabilistic meaning of these consequences. 



Example 4.3 The 3-dimensional Heisenberg group is realized on $\mathbb R^3$ with the multiplication 
defined by 

$$(x, y, z) \cdot (x', y', z') = \left( x + x',\; y + y',\; z + z' + \tfrac{1}{2}(xy' - yx') \right).$$ 

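The Lebesgue measure $\nu$ on $\mathbb R^3$ is a bi-invariant Haar measure. Let us define left-invariant 
vector fields $X$, $Y$ and $Z$ by 

$$X := \frac{\partial}{\partial x} - \frac{y}{2} \frac{\partial}{\partial z}, \qquad Y := \frac{\partial}{\partial y} + \frac{x}{2} \frac{\partial}{\partial z}, \qquad Z := \frac{\partial}{\partial z}.$$ 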

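Set $\mathcal H := \mathrm{Span}\{X, Y\}$. Then the diffusion process $\{B_t^{\mathbf x}\}_{t \ge 0}$ associated with $\Delta_{\mathcal H}/2 = 
(X^2 + Y^2)/2$ starting at $\mathbf x = (x, y, z) \in \mathbb R^3$ is given by 

$$B_t^{\mathbf x} := \left( x + W_t^{(1)},\; y + W_t^{(2)},\; z + \frac{1}{2} \int_0^t (x + W_s^{(1)})\, dW_s^{(2)} - (y + W_s^{(2)})\, dW_s^{(1)} \right),$$ 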

where $(W_t^{(1)}, W_t^{(2)})$ is a Brownian motion on $\mathbb R^2$. It means that the diffusion process 
associated with $\Delta_{\mathcal H}/2$ is given by the 2-dimensional Euclidean Brownian motion and the 
associated Lévy stochastic area. The corresponding heat semigroup is given by $P_t f(\mathbf x) = 
\mathbb E[f(B_t^{\mathbf x})]$ for $f \in C_b(X)$. In this framework, (4.1) for $q = 1$, $P = P_t$ and $K_1(t) = K$ is 
shown in [5, 22]. Thus we obtain $(C_\infty)$. It means that, for each $t > 0$ and $\mathbf x, \mathbf y \in \mathbb R^3$, there 
exists a coupling $(B_t^{\mathbf x}, B_t^{\mathbf y})$ of $B_t^{\mathbf x}$ and $B_t^{\mathbf y}$ such that 

$$d(B_t^{\mathbf x}, B_t^{\mathbf y}) \le K d(\mathbf x, \mathbf y) \tag{4.2}$$ 

holds almost surely. Here $d$ is the Carnot-Carathéodory distance associated with $\mathcal H$. In 
this case, it is known that $d$ is equivalent to the so-called Korányi distance. That is, there 
exist constants $C_1, C_2 > 0$ such that, for any $\mathbf x = (x, y, z)$, $\mathbf y = (x', y', z') \in \mathbb R^3$, 

$$C_1 d(\mathbf x, \mathbf y) \le \left( \left( (x - x')^2 + (y - y')^2 \right)^2 + \left( z - z' + \tfrac{1}{2}(xy' - yx') \right)^2 \right)^{1/4} \le C_2 d(\mathbf x, \mathbf y).$$ 
Thus (4.2) is also interpreted in terms of the Korányi distance. 
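
The objects in Example 4.3 can be explored numerically. The following is a minimal Python sketch, not part of the original argument: it implements the group law, an Euler discretization of the diffusion (planar Brownian motion together with its Lévy area), and a Korányi-type gauge of $\mathbf y^{-1} \cdot \mathbf x$. The function names, the step count, the seed and the normalization of the gauge are all assumptions made for illustration. Both paths are driven by the same increments only for convenience; this is not claimed to be the coupling whose existence is asserted in (4.2).

```python
import math
import random

def mul(a, b):
    """Heisenberg group law on R^3 as in Example 4.3."""
    x, y, z = a
    xp, yp, zp = b
    return (x + xp, y + yp, z + zp + 0.5 * (x * yp - y * xp))

def inv(a):
    """Group inverse: (x, y, z)^{-1} = (-x, -y, -z)."""
    return (-a[0], -a[1], -a[2])

def koranyi(a, b):
    """Koranyi-type gauge of b^{-1} * a (normalization is an assumption)."""
    u, v, w = mul(inv(b), a)
    return ((u * u + v * v) ** 2 + w * w) ** 0.25

def brownian_increments(t, steps, rng):
    """Independent Gaussian increments of a planar Brownian motion on [0, t]."""
    s = math.sqrt(t / steps)
    return [(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(steps)]

def simulate(start, increments):
    """Euler scheme for the diffusion started at `start`."""
    x, y, z = start
    w1 = w2 = 0.0
    for d1, d2 in increments:
        # Ito increment of the third coordinate, evaluated at the left endpoint.
        z += 0.5 * ((x + w1) * d2 - (y + w2) * d1)
        w1 += d1
        w2 += d2
    return (x + w1, y + w2, z)

rng = random.Random(0)  # fixed seed, chosen arbitrarily
incs = brownian_increments(1.0, 1000, rng)
start = (1.0, -2.0, 0.5)
end_from_start = simulate(start, incs)
end_from_origin = simulate((0.0, 0.0, 0.0), incs)
# Left-invariance: the path from `start` is the left translate of the path from 0.
shifted = mul(start, end_from_origin)
print(end_from_start, shifted)
print(koranyi(end_from_start, end_from_origin), koranyi(start, (0.0, 0.0, 0.0)))
```

The two printed triples agree up to rounding, which reflects the left-invariance of the vector fields $X$ and $Y$. The last line compares the gauge between the two endpoints with the gauge between the starting points for this particular choice of driving noise.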

Remark 4.4 In Example 4.3, $(C_\infty)$ provides only a coupling of $B_t^{\mathbf x}$ and $B_t^{\mathbf y}$ for each 
fixed $t > 0$. When $X$ is a Riemannian manifold, $(C_\infty)$ holds if and only if there exists a 
coupling $(B_t^x, B_t^y)_{t \ge 0}$ of two Brownian motions $(B_t^x)_{t \ge 0}$ and $(B_t^y)_{t \ge 0}$ starting from $x$ and 
$y$ respectively such that (4.2) holds for every $t \ge 0$ with $K = e^{-kt}$ almost surely (see 
[37], for instance). In Example 4.3, it is not clear whether a similar result holds or not. 
Actually, in the Riemannian case, the fact that the constant $e^{-kt}$ is multiplicative in $t \ge 0$ 
plays a prominent role in constructing a coupling of Brownian motions from a control of 
their infinitesimal motions. As observed in [12], we cannot expect such a multiplicativity 
in the case of Example 4.3. 

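Example 4.5 On $\mathbb R^n \times \mathbb R^{n(n-1)/2}$, we introduce a structure of a nilpotent Lie group of step 
2 as follows: for $\mathbf x = ((x_i)_{i=1}^n ; (z_{ij})_{1 \le i < j \le n})$, $\mathbf y = ((x_i')_{i=1}^n ; (z_{ij}')_{1 \le i < j \le n}) \in \mathbb R^n \times \mathbb R^{n(n-1)/2}$, 

$$\mathbf x \cdot \mathbf y = \left( (x_i + x_i')_{i=1}^n ; \left( z_{ij} + z_{ij}' + \tfrac{1}{2}(x_i x_j' - x_j x_i') \right)_{1 \le i < j \le n} \right).$$ 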



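As in Example 4.3, the Lebesgue measure $\nu$ on $\mathbb R^n \times \mathbb R^{n(n-1)/2}$ becomes a bi-invariant Haar 
measure. Let us define left-invariant vector fields $\{X_i\}_{i=1}^n$ and $\{Z_{ij}\}_{1 \le i < j \le n}$ by 

$$X_i := \frac{\partial}{\partial x_i} - \sum_{i < j \le n} \frac{x_j}{2} \frac{\partial}{\partial z_{ij}} + \sum_{1 \le j < i} \frac{x_j}{2} \frac{\partial}{\partial z_{ji}}, \qquad Z_{ij} := \frac{\partial}{\partial z_{ij}}.$$ 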

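Set $\mathcal H := \mathrm{Span}\{X_i\}_{i=1}^n$. The diffusion process $\{B_t^{\mathbf x}\}_{t \ge 0}$ associated with the sub-Laplacian 
$\Delta_{\mathcal H}/2 = \sum_{i=1}^n X_i^2 / 2$ starting at $\mathbf x = ((x_i)_{i=1}^n ; (z_{ij})_{1 \le i < j \le n}) \in \mathbb R^n \times \mathbb R^{n(n-1)/2}$ is given by 

$$B_t^{\mathbf x} = \left( (x_i + W_t^{(i)})_{i=1}^n ; \left( z_{ij} + \frac{1}{2} \int_0^t (x_i + W_s^{(i)})\, dW_s^{(j)} - (x_j + W_s^{(j)})\, dW_s^{(i)} \right)_{1 \le i < j \le n} \right).$$ 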



We can easily verify that this group is of type H only if $n = 2$ (see Corollary 1 of [19], 
for example). But it is still in the framework of [27]. Thus, for each $p \in [1, \infty)$, there 
is a constant $K_p > 0$ such that, for any pair $\mathbf x, \mathbf y \in \mathbb R^n \times \mathbb R^{n(n-1)/2}$, there is a coupling 
$(B_t^{\mathbf x}, B_t^{\mathbf y})$ of $B_t^{\mathbf x}$ and $B_t^{\mathbf y}$ satisfying 

$$\mathbb E\left[ d(B_t^{\mathbf x}, B_t^{\mathbf y})^p \right]^{1/p} \le K_p\, d(\mathbf x, \mathbf y). \tag{4.3}$$ 

Finally, we remark that a different kind of coupling of this process is studied by 
Kendall [20], who showed the existence of a successful coupling. As mentioned there, 
the study of couplings of this process may find future applications in rough path 
theory [15, 25]. 
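
As an illustration of the group structure in Example 4.5, the following Python sketch implements the multiplication on $\mathbb R^n \times \mathbb R^{n(n-1)/2}$ and checks two elementary facts on sample points: for $n = 2$ it reduces to the Heisenberg law of Example 4.3, and for $n = 3$ it is associative. The function name `free_step2_mul`, the 0-based indexing of the pairs $(i, j)$ and the sample points are assumptions made for this sketch only.

```python
from itertools import combinations

def free_step2_mul(a, b, n):
    """Group law of Example 4.5 on R^n x R^{n(n-1)/2}.

    An element is a pair (x, z): x is a list of length n, and z is a dict
    keyed by index pairs (i, j) with 0 <= i < j < n (0-based indices).
    """
    xa, za = a
    xb, zb = b
    x = [xa[i] + xb[i] for i in range(n)]
    z = {
        (i, j): za[(i, j)] + zb[(i, j)] + 0.5 * (xa[i] * xb[j] - xa[j] * xb[i])
        for i, j in combinations(range(n), 2)
    }
    return (x, z)

def heisenberg_mul(a, b):
    """Heisenberg group law on R^3 as in Example 4.3, for comparison."""
    x, y, z = a
    xp, yp, zp = b
    return (x + xp, y + yp, z + zp + 0.5 * (x * yp - y * xp))

# n = 2: one pair (0, 1), so the group is R^2 x R^1.
g2 = free_step2_mul(([1.0, 2.0], {(0, 1): 3.0}), ([-0.5, 0.25], {(0, 1): 4.0}), 2)
h = heisenberg_mul((1.0, 2.0, 3.0), (-0.5, 0.25, 4.0))

# n = 3: three pairs (0, 1), (0, 2), (1, 2), so the group is R^3 x R^3.
pairs3 = list(combinations(range(3), 2))
a = ([1.0, 2.0, -1.0], {p: 0.5 for p in pairs3})
b = ([0.5, -3.0, 2.0], {p: -1.0 for p in pairs3})
c = ([-2.0, 1.0, 4.0], {p: 2.0 for p in pairs3})
left = free_step2_mul(free_step2_mul(a, b, 3), c, 3)
right = free_step2_mul(a, free_step2_mul(b, c, 3), 3)
print(g2, h)
print(left == right)
```

Storing the central coordinates in a dictionary keyed by $(i, j)$ keeps the code close to the notation $(z_{ij})_{1 \le i < j \le n}$; a flat array would work equally well.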